Soft-Margin Support Vector Machine

Roadmap

1. Embedding Numerous Features: Kernel Models

Lecture 3: Kernel Support Vector Machine
kernel as a shortcut to (transform + inner product) to remove dependence on $\tilde{d}$: allowing a spectrum of simple (linear) models to infinite-dimensional (Gaussian) ones with margin control

Lecture 4: Soft-Margin Support Vector Machine
• Motivation and Primal Problem
• Dual Problem
• Messages behind Soft-Margin SVM
• Model Selection
Motivation and Primal Problem

Cons of Hard-Margin SVM

recall: SVM can still overfit :-(

[figure: illustration of hard-margin SVM overfitting]

• part of reasons: $\Phi$
• other part: separable

if always insisting on separable (=> shatter), have power to overfit to noise


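Give Up on Some Examples

want: give up on some noisy examples

pocket:
$$\min_{b,\mathbf{w}} \sum_{n=1}^{N} [\![\, y_n \neq \mathrm{sign}(\mathbf{w}^T\mathbf{z}_n + b) \,]\!]$$

hard-margin SVM:
$$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{s.t. } y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 \text{ for all } n$$

combination:
$$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} [\![\, y_n \neq \mathrm{sign}(\mathbf{w}^T\mathbf{z}_n + b) \,]\!]$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$ for correct $n$
     $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge -\infty$ for incorrect $n$

C: trade-off of large margin & noise tolerance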
Soft-Margin SVM (1/2)

$$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} [\![\, y_n \neq \mathrm{sign}(\mathbf{w}^T\mathbf{z}_n + b) \,]\!]$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \infty \cdot [\![\, y_n \neq \mathrm{sign}(\mathbf{w}^T\mathbf{z}_n + b) \,]\!]$

• $[\![\cdot]\!]$: non-linear, not QP anymore :-( (what about dual? kernel?)
• cannot distinguish small error (slightly away from fat boundary) or large error (a...w...a...y... from fat boundary)

• record 'margin violation' by $\xi_n$: linear constraints
• penalize with margin violation instead of error count: quadratic objective

soft-margin SVM:
$$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$ for all $n$
Soft-Margin SVM (2/2)

• record 'margin violation' by $\xi_n$
• penalize with margin violation

$$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$ for all $n$

• parameter C: trade-off of large margin & margin violation
  • large C: want less margin violation
  • small C: want large margin
• QP of $\tilde{d} + 1 + N$ variables, $2N$ constraints

next: remove dependence on $\tilde{d}$ by soft-margin SVM primal => dual?
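
For a fixed $(b, \mathbf{w})$, the objective only grows with each $\xi_n$, so the smallest feasible choice is $\xi_n = \max(0,\ 1 - y_n(\mathbf{w}^T\mathbf{z}_n + b))$. The sketch below evaluates that quantity and the primal objective; the function names and the toy data are made up for illustration and are not part of the lecture.

```python
def margin_violations(w, b, Z, y):
    """Return xi_n = max(0, 1 - y_n (w^T z_n + b)) for each example."""
    xis = []
    for z_n, y_n in zip(Z, y):
        score = sum(w_i * z_i for w_i, z_i in zip(w, z_n)) + b
        xis.append(max(0.0, 1.0 - y_n * score))
    return xis


def primal_objective(w, b, Z, y, C):
    """Soft-margin primal value: 0.5 * w^T w + C * (sum of violations)."""
    xis = margin_violations(w, b, Z, y)
    return 0.5 * sum(w_i * w_i for w_i in w) + C * sum(xis)


# Toy 1-D data (assumed for illustration); the last point is misclassified.
Z = [[2.0], [1.0], [-1.0], [0.5]]
y = [+1, +1, -1, -1]
w, b = [1.0], 0.0

xis = margin_violations(w, b, Z, y)              # only the last entry is positive
small_C = primal_objective(w, b, Z, y, C=1.0)
large_C = primal_objective(w, b, Z, y, C=10.0)   # same violation, penalized more
```

With the same $(b, \mathbf{w})$, raising C leaves the violations unchanged but makes them cost more, which is why a large C pushes the optimizer toward fewer violations.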





Fun Time

At the optimal solution of
$$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$ for all $n$,
assume that $y_1(\mathbf{w}^T\mathbf{z}_1 + b) = -10$. What is the corresponding $\xi_1$?
(1) 1
(2) 11
(3) 21
(4) 31

Reference Answer: (2)

$\xi_1$ is simply $1 - y_1(\mathbf{w}^T\mathbf{z}_1 + b)$ when $y_1(\mathbf{w}^T\mathbf{z}_1 + b) \le 1$; here $1 - (-10) = 11$.
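Dual Problem

Lagrange Dual

primal:
$$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n$$
s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$ for all $n$

Lagrange function with Lagrange multipliers $\alpha_n$ and $\beta_n$:
$$\mathcal{L}(b,\mathbf{w},\boldsymbol{\xi},\boldsymbol{\alpha},\boldsymbol{\beta}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n + \sum_{n=1}^{N} \alpha_n \cdot \left(1 - \xi_n - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) + \sum_{n=1}^{N} \beta_n \cdot (-\xi_n)$$

want: Lagrange dual
$$\max_{\alpha_n \ge 0,\ \beta_n \ge 0} \left( \min_{b,\mathbf{w},\boldsymbol{\xi}} \mathcal{L}(b,\mathbf{w},\boldsymbol{\xi},\boldsymbol{\alpha},\boldsymbol{\beta}) \right)$$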





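Simplify $\xi_n$ and $\beta_n$

$$\max_{\alpha_n \ge 0,\ \beta_n \ge 0} \left( \min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \cdot \sum_{n=1}^{N} \xi_n + \sum_{n=1}^{N} \alpha_n \cdot \left(1 - \xi_n - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) + \sum_{n=1}^{N} \beta_n \cdot (-\xi_n) \right)$$

• $\frac{\partial \mathcal{L}}{\partial \xi_n} = 0 = C - \alpha_n - \beta_n$
• no loss of optimality if solving with implicit constraint $\beta_n = C - \alpha_n$ and explicit constraint $0 \le \alpha_n \le C$: $\beta_n$ removed

$\boldsymbol{\xi}$ can also be removed :-), like how we removed $b$

$$\max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n} \left( \min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) + \sum_{n=1}^{N} (C - \alpha_n - \beta_n) \cdot \xi_n \right)$$

where the last sum vanishes because $C - \alpha_n - \beta_n = 0$.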





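Other Simplifications

$$\max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n} \left( \min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) \right)$$

familiar? :-)
inner problem same as hard-margin SVM

• $\frac{\partial \mathcal{L}}{\partial b} = 0$: no loss of optimality if solving with constraint $\sum_{n=1}^{N} \alpha_n y_n = 0$
• $\frac{\partial \mathcal{L}}{\partial w_i} = 0$: no loss of optimality if solving with constraint $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n$

standard dual can be derived using the same steps as Lecture 2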
Standard Soft-Margin SVM Dual

$$\min_{\boldsymbol{\alpha}} \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m \mathbf{z}_n^T \mathbf{z}_m - \sum_{n=1}^{N} \alpha_n$$
subject to $\sum_{n=1}^{N} y_n \alpha_n = 0$;
           $0 \le \alpha_n \le C$, for $n = 1, 2, \ldots, N$;
implicitly $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n$;
           $\beta_n = C - \alpha_n$, for $n = 1, 2, \ldots, N$

only difference to hard-margin: upper bound on $\alpha_n$

another (convex) QP, with $N$ variables & $2N + 1$ constraints
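
To see the upper bound on $\alpha_n$ at work, the sketch below evaluates the dual objective and brute-forces a two-example problem. It is a toy check under stated assumptions, not a QP solver: the data, the helper names, and the grid search are invented for illustration, and the shortcut $\alpha_1 = \alpha_2$ holds only because the two labels are opposite.

```python
def dual_objective(alpha, y, K):
    """0.5 * sum_n sum_m a_n a_m y_n y_m K[n][m] - sum_n a_n."""
    N = len(alpha)
    quad = sum(alpha[n] * alpha[m] * y[n] * y[m] * K[n][m]
               for n in range(N) for m in range(N))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, C, tol=1e-9):
    """Check sum_n y_n a_n = 0 and 0 <= a_n <= C."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed


def solve_two_point_dual(y, K, C, steps=1000):
    """Grid-search the dual for two examples with opposite labels.

    With y = (-1, +1), the equality constraint forces alpha_1 = alpha_2 = a,
    so a one-dimensional search over a in [0, C] covers the feasible set.
    """
    best_a, best_val = 0.0, dual_objective([0.0, 0.0], y, K)
    for i in range(steps + 1):
        a = C * i / steps
        val = dual_objective([a, a], y, K)
        if val < best_val:
            best_a, best_val = a, val
    return [best_a, best_a]


x = [-1.0, 1.0]                           # toy 1-D inputs (assumed)
y = [-1, +1]
K = [[xn * xm for xm in x] for xn in x]   # linear kernel matrix

alpha_loose = solve_two_point_dual(y, K, C=10.0)   # upper bound inactive
alpha_tight = solve_two_point_dual(y, K, C=0.25)   # upper bound active
```

On this toy problem the objective reduces to $2a^2 - 2a$, whose unconstrained minimum is $a = 0.5$; a large C leaves that solution untouched, while C = 0.25 clips both multipliers at the bound.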


Hsuan-Tien Lin (NTU CSIE), Machine Learning Techniques





Fun Time

In the soft-margin SVM, assume that we want to increase the parameter C by 2. How shall the corresponding dual problem be changed?
(1) the upper bound of $\alpha_n$ shall be halved
(2) the upper bound of $\alpha_n$ shall be decreased by 2
(3) the upper bound of $\alpha_n$ shall be increased by 2
(4) the upper bound of $\alpha_n$ shall be doubled

Reference Answer: (3)

Because C is exactly the upper bound of $\alpha_n$, increasing C by 2 in the primal problem is equivalent to increasing the upper bound by 2 in the dual problem.
Messages behind Soft-Margin SVM

Kernel Soft-Margin SVM

Kernel Soft-Margin SVM Algorithm
1. $q_{n,m} = y_n y_m K(\mathbf{x}_n, \mathbf{x}_m)$; $\mathbf{p} = -\mathbf{1}_N$; $(A, \mathbf{c})$ for equ./lower-bound/upper-bound constraints
2. $\boldsymbol{\alpha} \leftarrow \mathrm{QP}(Q_D, \mathbf{p}, A, \mathbf{c})$
3. $b \leftarrow\ ?$
4. return SVs and their $\alpha_n$ as well as $b$ such that for new $\mathbf{x}$,
   $$g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}\left( \sum_{\text{SV indices } n} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}) + b \right)$$

• almost the same as hard-margin
• more flexible than hard-margin: primal/dual always solvable

remaining question: step 3
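
Step 4 of the algorithm can be sketched as follows, assuming the support vectors, their multipliers, and $b$ are already available. The helper names, the example values, and the choice to map a score of exactly 0 to +1 are assumptions made for this illustration.

```python
import math


def gaussian_kernel(x1, x2, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)


def make_g_svm(sv_x, sv_y, sv_alpha, b, kernel):
    """Build g(x) = sign(sum over SVs of alpha_n * y_n * K(x_n, x) + b)."""
    def g(x):
        score = sum(a * t * kernel(xn, x)
                    for xn, t, a in zip(sv_x, sv_y, sv_alpha)) + b
        return 1 if score >= 0 else -1   # tie at 0 mapped to +1 (assumption)
    return g


# Hypothetical support vectors and multipliers, not the output of a real QP run.
sv_x = [[-1.0, 0.0], [1.0, 0.0]]
sv_y = [-1, +1]
sv_alpha = [0.7, 0.7]

g = make_g_svm(sv_x, sv_y, sv_alpha, b=0.0,
               kernel=lambda u, v: gaussian_kernel(u, v, gamma=1.0))
right = g([2.0, 0.0])    # closer to the positive SV
left = g([-2.0, 0.0])    # closer to the negative SV
```

Only the support vectors enter the sum, so prediction cost scales with the number of SVs rather than with N.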


Solving for b

hard-margin SVM
complementary slackness:
$$\alpha_n \left(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) = 0$$
• SV ($\alpha_s > 0$) => $b = y_s - \mathbf{w}^T\mathbf{z}_s$

soft-margin SVM
complementary slackness:
$$\alpha_n \left(1 - \xi_n - y_n(\mathbf{w}^T\mathbf{z}_n + b)\right) = 0$$
$$(C - \alpha_n)\,\xi_n = 0$$
• SV ($\alpha_s > 0$) => $b = y_s - y_s \xi_s - \mathbf{w}^T\mathbf{z}_s$
• free ($\alpha_s < C$) => $\xi_s = 0$

solve unique $b$ with free SV $(\mathbf{x}_s, y_s)$:
$$b = y_s - \sum_{\text{SV indices } n} \alpha_n y_n K(\mathbf{x}_n, \mathbf{x}_s)$$
(range of $b$ otherwise)
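
A minimal sketch of this step, assuming the dual solution $\boldsymbol{\alpha}$ is given: pick any free SV and apply the formula for $b$. The function name, the tolerance, and the two-point data are assumptions for illustration; when no free SV exists the sketch returns `None` rather than computing the feasible range.

```python
def solve_b(alpha, y, X, C, kernel, tol=1e-8):
    """Return b from the first free SV (0 < alpha_s < C), or None if none exists."""
    N = len(alpha)
    for s in range(N):
        if tol < alpha[s] < C - tol:
            total = sum(alpha[n] * y[n] * kernel(X[n], X[s])
                        for n in range(N) if alpha[n] > tol)
            return y[s] - total
    return None  # only bounded SVs: b is then limited to a range, not unique


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


X = [[-1.0], [1.0]]      # toy two-point data (assumed)
y = [-1, +1]
alpha = [0.5, 0.5]       # dual optimum of this two-point problem when C >= 0.5

b = solve_b(alpha, y, X, C=10.0, kernel=linear_kernel)
no_free = solve_b(alpha, y, X, C=0.5, kernel=linear_kernel)   # both SVs bounded
```

In practice one might average $b$ over all free SVs for numerical stability; that variant is a common convention rather than something stated on the slide.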
Soft-Margin Gaussian SVM in Action

[figure: soft-margin Gaussian SVM decision boundaries]

• large C => less noise tolerance => 'overfit'?
• warning: SVM can still overfit :-(

soft-margin Gaussian SVM: need careful selection of $(\gamma, C)$
Physical Meaning of $\alpha_n$

complementary slackness:
$$(C - \alpha_n)\,\xi_n = 0$$

• non SV ($0 = \alpha_n$): $\xi_n = 0$, 'away from'/on fat boundary
• □ free SV ($0 < \alpha_n < C$): $\xi_n = 0$, on fat boundary, locates $b$
• △ bounded SV ($\alpha_n = C$): $\xi_n$ = violation amount, 'violate'/on fat boundary

$\alpha_n$ can be used for data analysis
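
The three cases can be read directly off a solved $\boldsymbol{\alpha}$. The sketch below labels each example; the function name, the tolerance, and the sample multipliers are assumptions for illustration, and a tolerance is used because numerical QP solvers rarely return exact 0 or C.

```python
def categorize(alpha, C, tol=1e-8):
    """Label each example as 'non-SV', 'free SV', or 'bounded SV'."""
    labels = []
    for a in alpha:
        if a <= tol:
            labels.append("non-SV")       # xi_n = 0, away from or on the fat boundary
        elif a >= C - tol:
            labels.append("bounded SV")   # xi_n holds the violation amount
        else:
            labels.append("free SV")      # xi_n = 0, exactly on the fat boundary
    return labels


alpha = [0.0, 0.3, 1.0, 1.0, 0.0]   # made-up multipliers
labels = categorize(alpha, C=1.0)
num_bounded = labels.count("bounded SV")
```

Counting bounded SVs gives a quick view of how many examples the model gave up on, which is one way the multipliers support data analysis.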
Fun Time

For a data set of size 10000, after solving SVM, assume that there are 1126 support vectors, and 1000 of those support vectors are bounded. What is the possible range of $E_{in}(g_{\mathrm{SVM}})$ in terms of 0/1 error?
(1) $0.0000 \le E_{in}(g_{\mathrm{SVM}}) \le 0.1000$
(2) $0.1000 \le E_{in}(g_{\mathrm{SVM}}) \le 0.1126$
(3) $0.1126 \le E_{in}(g_{\mathrm{SVM}}) \le 0.5000$
(4) $0.1126 \le E_{in}(g_{\mathrm{SVM}}) \le 1.0000$

Reference Answer: (1)

The bounded support vectors are the only ones that could violate the fat boundary: $\xi_n \ge 0$. If $\xi_n \ge 1$, then the violation causes a 0/1 error on the example. On the other hand, it is also possible that $\xi_n < 1$, and in that case the violation does not cause a 0/1 error. So anywhere from 0 to 1000 of the 10000 examples can be in error.
Model Selection

Practical Need: Model Selection

[figure: Gaussian SVM results for different $(C, \gamma)$]

• complicated even for $(C, \gamma)$ of Gaussian SVM
• more combinations if including other kernels or parameters

how to select? validation :-)
Selection by Cross Validation

[figure: $E_{cv}$ on a 3 x 3 grid of $(C, \gamma)$; axis values not recoverable]
0.3500  0.3250  0.3250
0.2000  0.2250  0.2750
0.1750  0.2250  0.2000

• $E_{cv}(C, \gamma)$: 'non-smooth' function of $(C, \gamma)$
• proper models can be chosen by V-fold cross validation on a few grid values of $(C, \gamma)$

$E_{cv}$: very popular criterion for soft-margin SVM
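
The selection loop itself is independent of the learner. The sketch below shows V-fold cross validation over a small $(C, \gamma)$ grid; to stay self-contained it plugs in a stand-in learner (a Gaussian-kernel weighted vote that ignores C), which is NOT an SVM. In real use `make_trainer` would wrap a soft-margin SVM solver. All names and data here are assumptions for illustration.

```python
import math


def v_fold_cv_error(data, V, train, err):
    """Average validation error over V folds (fold v holds out indices v, v+V, ...)."""
    total = 0.0
    for v in range(V):
        val = data[v::V]
        trn = [ex for i, ex in enumerate(data) if i % V != v]
        g = train(trn)
        total += sum(err(g, ex) for ex in val) / len(val)
    return total / V


def select_by_cv(data, V, grid, make_trainer, err):
    """Return (best_params, best_error) over the parameter grid."""
    scored = [(v_fold_cv_error(data, V, make_trainer(*params), err), params)
              for params in grid]
    best_err, best_params = min(scored, key=lambda t: t[0])
    return best_params, best_err


def make_kernel_vote_trainer(C, gamma):
    """Stand-in learner (not an SVM): Gaussian-kernel weighted vote; C is unused."""
    def train(examples):
        def g(x):
            score = sum(t * math.exp(-gamma * (x - xn) ** 2) for xn, t in examples)
            return 1 if score >= 0 else -1
        return g
    return train


def zero_one(g, example):
    x, t = example
    return 0.0 if g(x) == t else 1.0


# Toy 1-D data, labeled by the sign of x (assumed for illustration).
data = [(-3.0, -1), (-2.0, -1), (-1.0, -1), (-0.5, -1), (0.5, +1),
        (1.0, +1), (2.0, +1), (3.0, +1), (-1.5, -1), (1.5, +1)]
grid = [(C, gamma) for C in (0.1, 1.0, 10.0) for gamma in (0.01, 1.0, 100.0)]
best_params, best_err = select_by_cv(data, 5, grid, make_kernel_vote_trainer, zero_one)
```

Because $E_{cv}$ is non-smooth in $(C, \gamma)$, the sketch simply enumerates grid points instead of attempting gradient-based optimization.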
Leave-One-Out CV Error for SVM

recall: $E_{loocv} = E_{cv}$ with $N$ folds

claim: $E_{loocv} \le \frac{\#\mathrm{SV}}{N}$
• for $(\mathbf{x}_N, y_N)$: if optimal $\alpha_N = 0$ (non-SV)
  => $(\alpha_1, \alpha_2, \ldots, \alpha_{N-1})$ still optimal when leaving out $(\mathbf{x}_N, y_N)$
  key: what if there's better $\alpha_n$?
• SVM: $g^- = g$ when leaving out non-SV

$$e_{\text{non-SV}} = \mathrm{err}(g^-, \text{non-SV}) = \mathrm{err}(g, \text{non-SV}) = 0$$
$$e_{\text{SV}} \le 1$$

motivation from hard-margin SVM: only SVs needed

scaled #SV bounds leave-one-out CV error
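
The bound is a one-line computation once the number of SVs is known. The sketch below applies it to screen candidate values; the function name and the numbers (200 examples, 30 SVs) are made up for illustration.

```python
def loocv_upper_bound(num_sv, N):
    """Upper bound on the leave-one-out CV error: #SV / N."""
    return num_sv / N


bound = loocv_upper_bound(30, 200)   # hypothetical: 30 SVs out of 200 examples

# Any claimed leave-one-out error above the bound is impossible for this model.
claimed = [0.00, 0.10, 0.15, 0.40]
possible = [e for e in claimed if e <= bound]
```

The bound says nothing about how far below $\#\mathrm{SV}/N$ the true leave-one-out error lies, so a small SV count is reassuring while a large one is only a warning sign.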
Selection by # SV

[figure: nSV on a 3 x 3 grid of $(C, \gamma)$; values not reliably recoverable]

• nSV$(C, \gamma)$: 'non-smooth' function of $(C, \gamma)$, difficult to optimize
• just an upper bound!
• dangerous models can be ruled out by nSV on a few grid values of $(C, \gamma)$

nSV: often used as a safety check if computing $E_{cv}$ is too time-consuming
Fun Time

For a data set of size 10000, after solving SVM on some parameters, assume that there are 1126 support vectors, and 1000 of those support vectors are bounded. Which of the following cannot be $E_{loocv}$ with those parameters?
(1) 0.0000
(2) 0.0805
(3) 0.1111
(4) 0.5566

Reference Answer: (4)

Note that the upper bound of $E_{loocv}$ is $1126 / 10000 = 0.1126$.
Summary

1. Embedding Numerous Features: Kernel Models

Lecture 4: Soft-Margin Support Vector Machine
• Motivation and Primal Problem: add margin violations $\xi_n$
• Dual Problem: upper-bound $\alpha_n$ by C
• Messages behind Soft-Margin SVM: bounded/free SVs for data analysis
• Model Selection: cross-validation, or approximately nSV

• next: other kernel models for soft binary classification

2. Combining Predictive Features: Aggregation Models
3. Distilling Implicit Features: Extraction Models





